Random Forest Encoder (Operator Toolbox)

Synopsis

This operator applies a Random Forest model to a data set. Unlike the usual Apply Model operator, it does not create predictions and overall confidences; instead it returns, for each individual tree in the forest, the confidence for the positive class. The result is an ExampleSet with X new attributes (where X is the number of trees) called score_X. This output can be used as an encoding of the data. One application is to build a more sophisticated voting model than the typical (average) voting by training another learner on the per-tree scores. Another use case is to encode nominal features into numerical ones.
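The encoding idea can be illustrated outside of RapidMiner with a minimal Python sketch. It is not the operator's implementation: the three "trees" are hypothetical hand-written functions standing in for a trained forest, the attribute names a and b are made up, and numbering the score attributes from 1 is an assumption. The remove_original_attributes argument mirrors the operator parameter of the same name.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Tree = Callable[[Row], float]  # returns the confidence for the positive class


# Hypothetical stand-ins for the trees of a trained Random Forest.
def tree_1(row: Row) -> float:
    return 0.9 if row["a"] > 0.5 else 0.2


def tree_2(row: Row) -> float:
    return 0.7 if row["b"] > 1.0 else 0.1


def tree_3(row: Row) -> float:
    return 0.8 if row["a"] + row["b"] > 1.5 else 0.3


def encode(rows: List[Row], forest: List[Tree],
           remove_original_attributes: bool = False) -> List[Row]:
    """Add one score_i attribute per tree (numbering from 1 is an assumption)."""
    encoded = []
    for row in rows:
        scores = {f"score_{i}": tree(row)
                  for i, tree in enumerate(forest, start=1)}
        # Keep only the scores, or append them to the original attributes.
        encoded.append(scores if remove_original_attributes
                       else {**row, **scores})
    return encoded


forest = [tree_1, tree_2, tree_3]
rows = [{"a": 0.8, "b": 0.4}, {"a": 0.1, "b": 2.0}]
encoded = encode(rows, forest, remove_original_attributes=True)
print(encoded)
```

Each row ends up with as many score attributes as there are trees, which is what a second learner would then be trained on.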

This operator also provides a preprocessing model. It can be grouped with any subsequent model so that the two are applied one after another.
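Grouping a preprocessing model with a subsequent model amounts to applying the steps in order. The following Python sketch shows only that composition; GroupedModel, encoder and learner are hypothetical names invented for illustration, and the scoring rules inside them are made up rather than taken from any trained model.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class GroupedModel:
    """Applies a sequence of models one after another (illustrative only)."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = list(steps)

    def apply(self, rows: List[Row]) -> List[Row]:
        for step in self.steps:
            rows = step(rows)
        return rows


def encoder(rows: List[Row]) -> List[Row]:
    # Hypothetical preprocessing model: a two-tree encoder that drops
    # the original attributes and keeps only the per-tree scores.
    return [{"score_1": 0.9 if r["a"] > 0.5 else 0.2,
             "score_2": 0.7 if r["b"] > 1.0 else 0.1} for r in rows]


def learner(rows: List[Row]) -> List[Row]:
    # Hypothetical downstream model working on the encoded attributes.
    return [{**r, "prediction":
             "yes" if (r["score_1"] + r["score_2"]) / 2 > 0.5 else "no"}
            for r in rows]


grouped = GroupedModel([encoder, learner])
result = grouped.apply([{"a": 0.8, "b": 2.0}, {"a": 0.1, "b": 0.4}])
print([r["prediction"] for r in result])
```

Because the grouped model carries the encoding step with it, new data can be passed in with its original attributes and the downstream model still receives the score attributes it was trained on.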

Input

  • exa (Data Table)

    Input ExampleSet which should be encoded.

  • mod (Random Forest Model)

    Random Forest model which is used for encoding.

Output

  • exa (Data Table)

    The ExampleSet with the result of the application.

  • mod

    The passed through Random Forest model.

  • pre

    A preprocessing model which can be used to apply the same transformation to another data set. This can also be used with the Group Models operator.

Parameters

  • remove_original_attributes

    If checked, all original attributes are removed from the resulting ExampleSet and only the encoding attributes are kept. Range:

Tutorial Processes

Encode Sonar

In this process we read the Sonar data set and encode it with a Random Forest with 10 trees. The result is a new ExampleSet containing 10 scores but no longer the original attributes.

Encode Golf and use Logistic Regression

In this process we read the Golf data set and create a cross-validated classification model on it. We encode our data with a Random Forest and use a Logistic Regression on the encoded results. The models are grouped together using the preprocessing model provided by the Random Forest Encoder.